Telemetry: normalize host key for per-host client + breaker registries#364
Merged
Conversation
Both clientManager and circuitBreakerManager keyed their maps by the raw host string, so DSN variants like "example.com", "example.com/", and "https://example.com" produced separate telemetry clients and separate circuit breakers for the same logical host. That fragments trip state and defeats the per-host consolidation the managers are designed to do. Introduce normalizeHostKey (lowercase, trim whitespace, strip http/https scheme, trim trailing slashes) and use it for all registry lookups in clientManager.getOrCreateClient, clientManager.releaseClient, and circuitBreakerManager.getCircuitBreaker. The host string flowing into URL construction is unchanged — first-caller-wins, matching existing user-agent semantics documented on getOrCreateClient. Co-authored-by: Isaac
3 tasks
samikshya-db
added a commit
that referenced
this pull request
May 25, 2026
## Summary Bump `DriverVersion` to `1.12.0` and add the `v1.12.0` section to `CHANGELOG.md`. ### Notable changes since v1.11.1 - Retry transient S3 errors in CloudFetch downloads and staging PUT/GET/REMOVE operations (#355, #361) - Telemetry: normalize host key for per-host client + breaker registries; stop retrying into 429s, honour Retry-After, fix userAgent (#354, #364) - Bump dependencies to clear Go-1.20-compatible CVEs: golang-jwt, x/net, protobuf, go-jose v3.0.5 (CVE-2026-34986) (#360, #363) ## Test plan - [x] `go test ./... -count=1 -short` passes locally - [x] Multi-DBR tests in `universe/peco/correctness/go` passed (all 5 suites green against DBR 18.x-photon-scala2.13) - [ ] CI green This pull request was AI-assisted by Isaac. Signed-off-by: samikshya-chand_data <samikshya.chand@databricks.com>
vikrantpuppala
added a commit
that referenced
this pull request
Jun 1, 2026
The sentinel Watch loop had a race: once StatusFn reported Done, it spawned the OnDoneFn processor and looped back into its select with two potentially-ready cases at once -- the processor result (resCh) and ctx.Done(). Go's select picks randomly among ready cases, so a context cancellation arriving at the same instant as completion could win and trigger OnCancelFn, cancelling an operation that had already finished. TestConn_ExecContext/"ExecContext uses new context to close operation" cancels the context inside GetOperationStatus while returning FINISHED, then asserted cancelOperationCount == 1 -- i.e. it asserted on the losing side of that random select. The Go scheduler version only shifts the probability; the race is in the design, not the language version. The racy pattern dates to #41 (Nov 2022) and the test to #86 (Dec 2022); it was not introduced by #364. Fix: once Done is observed, the operation has logically completed and its outcome is authoritative. Drain the OnDoneFn result via a new waitForDone helper that selects only on resCh/errCh, no longer on ctx.Done()/timeout. A finished operation is now never cancelled, deterministically. Update the connection test to assert the correct deterministic behaviour (success, cancelOperationCount == 0, CloseOperation still runs on a fresh context) and add a sentinel regression test that pre-cancels the context in the same StatusFn call that reports Done. Verified: TestConn_ExecContext -count=300 green; sentinel suite green under -race -count=5. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
clientManagerandcircuitBreakerManagerkeyed their maps by the raw host string, so DSN variants likeexample.com,example.com/, andhttps://example.comcreated separate telemetry clients and separate circuit breakers for the same logical host. That fragments trip state and defeats the per-host consolidation the managers are designed to do.normalizeHostKey(lowercase, trim whitespace, striphttp(s)://, trim trailing slashes) and use it for all registry lookups inclientManager.getOrCreateClient,clientManager.releaseClient, andcircuitBreakerManager.getCircuitBreaker.getOrCreateClient.Addresses Gopal's review comment on #354: #354 (comment)... (host normalization on
telemetry/exporter.go:62).Test plan
TestNormalizeHostKey— covers scheme stripping, lowercase, trim trailing slash, whitespace, empty stringTestClientManager_HostVariantsShareClient— four DSN variants of the same host share a single telemetry client; release via a different variant still cleans upTestCircuitBreakerManager_HostVariantsShareBreaker— same coverage for the breaker registrygo test ./telemetry/...passesThis pull request and its description were written by Isaac.